Best Subset Selection for Eliminating Multicollinearity

Authors

  • Ryuta Tamura
  • Ken Kobayashi
  • Yuichi Takano
  • Ryuhei Miyashiro
  • Kazuhide Nakata
  • Tomomi Matsui
Abstract

This paper proposes a method for eliminating multicollinearity from linear regression models. Specifically, we select the best subset of explanatory variables subject to an upper bound on the condition number of the correlation matrix of the selected variables. We first develop a cutting plane algorithm that iteratively appends valid inequalities to a mixed integer quadratic optimization problem in order to approximate the condition number constraint. We also devise mixed integer semidefinite optimization formulations for best subset selection under the condition number constraint. Computational results demonstrate that our cutting plane algorithm frequently provides solutions of better quality than those obtained using local search algorithms for subset selection. Additionally, subset selection by means of our optimization formulations succeeds when the number of candidate explanatory variables is small.
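To make the condition number constraint concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's cutting plane algorithm or its semidefinite formulations; it is a brute-force enumeration that is only practical for a handful of candidate variables. The function names `condition_number` and `best_subset`, the threshold parameter `kappa`, and the use of the residual sum of squares as the objective are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def condition_number(corr):
    """Ratio of the largest to the smallest eigenvalue of a symmetric matrix."""
    eig = np.linalg.eigvalsh(corr)  # eigenvalues in ascending order
    # Guard against zero or slightly negative eigenvalues of singular matrices.
    return eig[-1] / max(eig[0], 1e-12)


def best_subset(X, y, kappa):
    """Brute-force search (illustrative only, exponential in the number of columns).

    Returns (rss, subset) for the subset with the smallest residual sum of
    squares among those whose correlation matrix has condition number <= kappa.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    yc = y - y.mean()                          # center the response
    R = np.corrcoef(Xs, rowvar=False)          # full correlation matrix
    p = X.shape[1]
    best = (np.inf, ())
    for k in range(1, p + 1):
        for S in combinations(range(p), k):
            idx = list(S)
            # Skip subsets that violate the condition number bound.
            if condition_number(R[np.ix_(idx, idx)]) > kappa:
                continue
            beta, *_ = np.linalg.lstsq(Xs[:, idx], yc, rcond=None)
            rss = float(np.sum((yc - Xs[:, idx] @ beta) ** 2))
            if rss < best[0]:
                best = (rss, S)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + 1e-3 * rng.normal(size=200)  # nearly collinear with x1
    x3 = rng.normal(size=200)
    X = np.column_stack([x1, x2, x3])
    y = x1 + x3 + 0.1 * rng.normal(size=200)
    print(best_subset(X, y, kappa=10.0))
```

In this synthetic example the first two columns are nearly identical, so any subset containing both has a very large condition number and is excluded by the bound. Enumeration scales as 2^p, which is why optimization formulations such as those described in the abstract are needed beyond a few variables.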

Similar resources

Effects of Multicollinearity in All Possible Mixed Model Selection

The effects of multicollinearity in all possible model selection of fixed effects including quadratic and cross products in the presence of random and repeated measures effects are presented here. The user-friendly SAS macro application ALLMIXED2 complements the model selection option currently available in the SAS macro applications ‘REGDIAG’ and ‘LOGISTIC’ for multiple linear and logistic reg...

A non-linear data mining parameter selection algorithm for continuous variables

In this article, we propose a new data mining algorithm, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that would capture complex relationships in the data via mathematical transforma...

A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum

Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...

Correlated Component Regression: Re-thinking Regression in the Presence of Near Collinearity

We introduce a new regression method – called Correlated Component Regression (CCR) – which provides reliable predictions even with near multicollinear data. Near multicollinearity occurs when a large number of correlated predictors and a relatively small sample size exist, as well as in situations involving a relatively small number of correlated predictors. Different variants of CCR are tailored t...

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

Publication date: 2016